Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis

نویسندگان

Dong Huang

Jian-Huang Lai

Chang-Dong Wang

چکیده

The clustering ensemble technique aims to combine multiple clusterings into a probably better and more robust clustering and has been receiving an increasing attention in recent years. There are mainly two aspects of limitations in the existing clustering ensemble approaches. Firstly, many approaches lack the ability to weight the base clusterings without access to the original data and can be affected significantly by the low-quality, or even ill clusterings. Secondly, they generally focus on the instance level or cluster level in the ensemble system and fail to integrate multi-granularity cues into a unified model. To address these two limitations, this paper proposes to solve the clustering ensemble problem via crowd agreement estimation and multigranularity link analysis. We present the normalized crowd agreement index (NCAI) to evaluate the quality of base clusterings in an unsupervised manner and thus weight the base clusterings in accordance with their clustering validity. To explore the relationship between clusters, the source aware connected triple (SACT) similarity is introduced with regard to their common neighbors and the source reliability. Based on NCAI and multi-granularity information collected among base clusterings, clusters, and data instances, we further ∗Corresponding author. Present address: School of Information Science and Technology, Sun Yat-sen University, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, Guangdong, 510006, P. R. China. Tel.: +86-13168313819. Fax: +86-2084110175. Email addresses: [email protected] (Dong Huang), [email protected] (Jian-Huang Lai), [email protected] (Chang-Dong Wang) Preprint submitted to Neurocomputing June 6, 2016 ar X iv :1 40 5. 12 97 v2 [ st at .M L ] 3 J un 2 01 6 propose two novel consensus functions, termed weighted evidence accumulation clustering (WEAC) and graph partitioning with multi-granularity link analysis (GP-MGLA) respectively. The experiments are conducted on eight real-world datasets. The experimental results demonstrate the effectiveness and robustness of the proposed methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

انتخاب اعضای ترکیب در خوشه‌بندی ترکیبی با استفاده از رأی‌گیری

Clustering is the process of division of a dataset into subsets that are called clusters, so that objects within a cluster are similar to each other and different from objects of the other clusters. So far, a lot of algorithms in different approaches have been created for the clustering. An effective choice (can combine) two or more of these algorithms for solving the clustering problem. Ensemb...

متن کامل

Shear-Flexural Interaction in Analysis of Reduced Web Section Beams using VM Link Element

Reduced web section beams in shear-yielding moment-resistant steel frames are used for energy dissipating of earthquakes. The finite element analysis indicates that failure mode of these beams are governed by the combination of shear force and flexural moment. Therefore the analysis of frames with reduced web section beams needs consideration of shear-flexural interaction in those sections. In ...

متن کامل

From Comparing Clusterings to Combining Clusterings

This paper presents a fast simulated annealing framework for combining multiple clusterings (i.e. clustering ensemble) based on some measures of agreement between partitions, which are originally used to compare two clusterings (the obtained clustering vs. a ground truth clustering) for the evaluation of a clustering algorithm. Though we can follow a greedy strategy to optimize these measures a...

متن کامل

Combining multiple clusterings using similarity graph

Multiple clusterings are produced for various needs and reasons in both distributed and local environments. Combining multiple clusterings into a final clustering which has better overall quality has gained importance recently. It is also expected that the final clustering is novel, robust, and scalable. In order to solve this challenging problem we introduce a new graph-based method. Our metho...

متن کامل

Information-Theoretic Multi-view Domain Adaptation: A Theoretical and Empirical Study

Multi-view learning aims to improve classification performance by leveraging the consistency among different views of data. The incorporation of multiple views was paid little attention in the studies of domain adaptation, where the view consistency based on source data is largely violated in the target domain due to the distribution gap between different domain data. In this paper, we leverage...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

Neurocomputing

دوره 170 شماره

صفحات -

تاریخ انتشار 2015

Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis

نویسندگان

چکیده

منابع مشابه

انتخاب اعضای ترکیب در خوشه‌بندی ترکیبی با استفاده از رأی‌گیری

Shear-Flexural Interaction in Analysis of Reduced Web Section Beams using VM Link Element

From Comparing Clusterings to Combining Clusterings

Combining multiple clusterings using similarity graph

Information-Theoretic Multi-view Domain Adaptation: A Theoretical and Empirical Study

عنوان ژورنال:

اشتراک گذاری